Abstract: Context categorization is a fundamental pre-requisite for multi-domain multimedia content analysis applications in order to
through statistical descriptors that keep the underlying information. These extracted features offer a highly discriminant behavior for
1 Introduction

The importance of context is well known in Content Based Image
Retrieval (CBIR) [1], [2]. Many low-level features based on shape,
texture, color and other local descriptors that are broadly used in
computer vision suffer, under multi-domain conditions, from a circular
interdependence of feature extractors: a priori information is needed
to parametrize adequate feature extractors. A possible approach to
reduce this dependency is to exploit global image context
characterization for semantic domain inference. This prior information
on scene context can be a valuable asset in computer vision for
purposes ranging from regularization to the pre-selection of local
primitive feature extractors [3].

Novel semantic approaches that try to overcome the limitations of
fixed taxonomies and manual annotations rely on automatic or
semiautomatic ingestion processes. These processes minimize the
semantic gap by introducing semantic middleware [4] layers based on a
combination of:

• explicit information provided by human made tax- 
onomies. 

• relevance feedback data and knowledge extracted 
from manual annotations. 

• implicit information obtained by data mining tech- 
niques through training processes. 

This is especially relevant for broad-domain, data-intensive
multimedia retrieval activities such as news production in the TV
broadcasting sector or the navigation and exploitation of large-scale
earth observation archives.

1.1 Related works 

Research contributions related to the approach proposed 
in this paper are outlined in this section. 

Local features have been broadly used for context categorization
[5], [6]. SIFT [7] and SURF [8] are among the most popular choices in
this respect. A two-step approach for the efficient use of local
features is proposed by several authors, such as Ravinovich et al. [9]
and Choi et al. [10]. Olaizola et al. [11] propose an architecture for
hypothesis reinforcement based on an initial analysis of low-level
features for context categorization and further hypothesis creation.
This architecture can exploit context-specific feature extractors to
validate or reject the initial context hypothesis, which stresses the
value of global descriptors.

Among the different global descriptors, such as histograms of several
local features [12], texture features and self-similarity [13], some
specific algorithms in the literature have shown great potential. GIST
[14], [15] is probably one of the most popular. Watanabe et al. [16]
propose a global descriptor based on the codewords provided by
Lempel-Ziv [17], [18] entropy coders, exploiting the relationship
between the complexity of an image and the context to which it may
belong. The Ridgelet transform [19], [20], [21] has been successfully
used as a global feature for image categorization and handwritten
character recognition. In operational implementations, all these
algorithms are typically combined with other global or local features.

The trace transform has already been used in several computer vision
applications. Indeed, a method based on this transform has been
included in the MPEG-7 [22] standard specification for image
fingerprinting [23], [24]. Other applications (mostly with monochrome
images) include face recognition [25], [26], [27], [28], character
recognition [29] and sign recognition [30]. The fingerprinting
approach, based on a recursive application of the trace transform to
reduce the dimensionality of the obtained feature space, offers
excellent performance for image fingerprinting. However, it does not
offer good discriminative characteristics as a method for domain
characterization, due to the high data loss that occurs in the
diametrical and circus functionals.

The approach proposed by Liu and Wang [27] uses Principal Component
Analysis (PCA) to select the most relevant coefficients and reduce the
dimensionality of the feature space. However, this approach does not
take into account the frequency relationships among the different
coefficients, and it increases the complexity of feature extraction
since it requires the covariance matrix information of all previous
samples. Moreover, the relevance of each individual DCT coefficient as
a feature is too low, and too sensitive to noise and variations.

In the following sections, a new method for context categorization
based on the trace transform is presented. This method provides higher
discriminative characteristics at a very low dimensionality, a key
factor for efficient retrieval in massive content databases.

This paper is organized as follows. Section 2 gives a general overview
of our proposed DITEC method. Section 2.1 addresses image
pre-processing issues. The trace transform and its properties are
analyzed in Section 2.2. Details of the feature extraction process are
presented in Section 2.3, while the classification process is
described in Section 2.4. The validation carried out with two
different datasets is explained in Section 3. Finally, Section 4
concludes with a discussion of the results.




Fig. 1. DITEC system workflow

2 General Description of the DITEC Method

We introduce a hierarchical probabilistic model in terms of the random
variables $D$, $I$, $T$, $E$ and $C$. The fundamental objective of
DITEC is to derive an appropriate estimate $\hat{C}$ of the unknown
global image semantic concept $C$ from an observed data set $D$
(Figure 1). Geometric and radiometric/colorimetric indeterminacies are
treated by introducing the concept of an unknown "clean" image $I$,
whose parameters depend on the elementary scene descriptors $T$, which
depend on the scene content $E$, which in turn depends on the context
$C$. The conditional probabilistic links between the different layers
in the workflow correspond to the main processing steps of the DITEC
method.
The four DITEC steps are thus the following:

Sensor modeling: image acquisition and pre-processing (radiometric
noise, color space, geometric quantization and image lattice
finiteness effects).



Data transformation: the clean image contents $I$ are expressed in
terms of the scene elements by means of a trace transform operation,
with $T$ as the outcome of the process. The result depends on the
chosen functional (e.g., [33]) and on the selected geometric
parameters (detailed in Section 2.2.3). The outcome of the trace
transform of an image is a two-dimensional signal composed of
sinusoidal waves: the original image is represented in the resulting
signal in terms of sinusoids with a particular amplitude, phase,
frequency and intensity. This characterization is one of the key steps
in the overall information extraction process.

Feature extraction: summarization of the extracted features $T$,
compressed and adapted into a manageable set $E$ of object-based
descriptors. The wave features contained in the resulting image must
be characterized. To do this, the 2D trace signal $T_k$ is transformed
to the frequency domain. To concentrate the signal energy in the
lowest spatial frequencies, a two-dimensional Discrete Cosine
Transform (DCT) is applied. The DCT is then compressed into a vector
of two-component elements: the average value and the kurtosis of the
elements on each line orthogonal to the main diagonal (Figure 6). This
transformation considers the DCT space as representable by a
superposition of Gaussian-shaped clusters. It aims at reducing the
dimensionality of the descriptor space while preserving the essential
information, so that the subsequent classification process performs
well. The last $n$ values of the resulting vector of data pairs can be
disregarded for an empirical reason: given the low-pass behavior of
most natural images, the DCT concentrates the highest values in the
lowest coefficients.

Class assignment: the vectors obtained in the previous step are
processed to improve the performance of classifiers in the defined
feature space. All the obtained vectors are statistically analyzed to
select their most representative attributes. Then the supervised
classification process is carried out to obtain an estimate
$\hat{C}$ of the unknown global image semantic concept $C$.

By applying the probability chain decomposition rule, the probability
(1) of an asset belonging to a given class can be decomposed in terms
of the different layers of the model. Estimates for $C_i$, $E_j$,
$T_k$, $I_l$ and $D_m$ are the results of the $p(C|E)$, $p(E|T)$,
$p(T|I)$ and $p(I|D)$ processes, given the usual conditional
independence assumptions implied by a hierarchical model:



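$$
\begin{aligned}
p(C_i \mid D_m) &= p(C_i \mid E_j, T_k, I_l, D_m) \\
                &= p(C_i \mid E_j)\, p(E_j \mid T_k)\, p(T_k \mid I_l)\, p(I_l \mid D_m)\, p(D_m)
\end{aligned}
\tag{1}
$$

where:

$$
\begin{aligned}
0 < i \le n_{classes}, \quad & i \in \mathbb{N} \\
0 < j < \infty, \quad & j \in \mathbb{N} \\
0 < k < \infty, \quad & k \in \mathbb{N} \\
0 < l \le n_{orig.images}, \quad & l \in \mathbb{N} \\
0 < m \le n_{orig.images}, \quad & m \in \mathbb{N}
\end{aligned}
\tag{2}
$$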

$p(C_i \mid E_j)$ is the probability that the data mining processes
correctly determine the class to which the image belongs, given $E$ as
a set of features. The second element, $p(E_j \mid T_k)$, can be
understood as $p(T_k \mid E_j)\,p(E_j)/p(T_k)$ following Bayes'
theorem. It shows that this model layer is linked to the information
representativeness of the extracted features. $p(T_k \mid I_l)$
corresponds to the trace transform, a deterministic process with a
slight denoising effect. The quality of the data $D_m$ and of the
pre-processed image $I_l$ is fundamental for an effective feature
extraction process. In fact, the joint inference/estimation process
depends on the trace transform, which can be regarded as a data
re-ordering → compression → feature space optimization process.

2.1 Sensor modeling 

The first pre-processing step transforms the RGB color space into
YCbCr [34]. The luminance channel (Y) is used as the most relevant
channel to encode shape-related features. Color distribution
information is encoded by processing the chrominance channels
(Cb, Cr).

Fig. 2. Trace transform, geometrical representation

In order to reduce the effects introduced by radiometric noise, the
image lattice and quantization, a low-pass filter is applied to each
channel.

HSV [34] color space information is encoded by obtaining the mean and
variance values $(\mu, \sigma)$ of the corresponding intensity
distributions in each of the H, S and V channels. In the Attribute
Selection process, this $(\mu, \sigma)$ information is introduced into
the obtained descriptor $E$.
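
The color handling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the full-range BT.601 (JPEG-style) YCbCr coefficients are an assumption, since the text does not state which YCbCr variant is used, the low-pass filtering step is omitted, and the function names `rgb_to_ycbcr` and `hsv_statistics` are hypothetical.

```python
import colorsys
from statistics import mean, pvariance


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr for 8-bit values (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def hsv_statistics(pixels):
    """Return [(mean, variance)] for the H, S and V channels of 8-bit RGB pixels.

    Hue is treated as a plain number here; its circular nature is ignored.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
           for r, g, b in pixels]
    return [(mean(channel), pvariance(channel)) for channel in zip(*hsv)]


# Tiny hypothetical "image": white, red, blue and black pixels.
pixels = [(255, 255, 255), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
ycbcr = [rgb_to_ycbcr(*p) for p in pixels]
hsv_stats = hsv_statistics(pixels)
```

In this sketch the Y, Cb and Cr planes would each be passed on to the trace transform, while the three (mean, variance) pairs are kept aside for the descriptor.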



2.2 Data transformation 

The data transformation process is carried out through the trace
transform, a generalization of the Radon transform (3) in which the
integral of the function is replaced by any other functional $S$
[30], [31], [35], [36], [37].

$$
R(\phi, p) = \iint f(x, y)\,\delta(x \cos\phi + y \sin\phi - p)\,dx\,dy \tag{3}
$$

The trace transform consists of applying a functional $S$ along a
straight line ($L$ in Figure 2). This line is moved tangentially to a
circle of radius $p$, covering the set of all tangential lines defined
by $\phi$. The Radon transform has been used to characterize images
[38] in well-defined domains [39], in image fingerprinting [40] and as
a primitive feature for general image description. The trace transform
extends the Radon transform by enabling the definition of the
functional, thus enhancing the control over the feature space. These
features can be set up to show scale, rotation/affine transformation
invariance or high discriminance for specific content domains.
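
A minimal sketch of this line-scanning procedure is given below. It is illustrative only: nearest-neighbour sampling, the centred coordinate system and the parameter names `n_phi`, `n_p` and `n_t` (number of angles, radii and samples along each line) are assumptions, and the functional is passed in as an ordinary Python callable.

```python
import math


def trace_transform(image, functional, n_phi, n_p, n_t):
    """Apply `functional` to samples taken along lines tangent to circles of radius p.

    image: list of rows; the origin is assumed to be the image centre.
    Returns a matrix with n_phi rows (angle) and n_p columns (radius).
    """
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    radius = min(width, height) / 2.0
    result = []
    for a in range(n_phi):
        phi = 2.0 * math.pi * a / n_phi
        cos_phi, sin_phi = math.cos(phi), math.sin(phi)
        row = []
        for b in range(n_p):
            p = radius * b / max(n_p - 1, 1)
            samples = []
            for c in range(n_t):
                t = radius * (2.0 * c / max(n_t - 1, 1) - 1.0)
                # Point at position t on the line x*cos(phi) + y*sin(phi) = p.
                x = p * cos_phi - t * sin_phi
                y = p * sin_phi + t * cos_phi
                col, lin = int(round(x + cx)), int(round(y + cy))
                if 0 <= col < width and 0 <= lin < height:
                    samples.append(image[lin][col])
            row.append(functional(samples) if samples else 0.0)
        result.append(row)
    return result


# Hypothetical 9x9 test image with a single bright pixel at the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
radon_like = trace_transform(image, sum, n_phi=8, n_p=5, n_t=9)
max_trace = trace_transform(image, max, n_phi=8, n_p=5, n_t=9)
```

Using `sum` as the functional gives an unscaled discrete approximation of the Radon transform, while any other callable (here `max`) yields a different trace transform over the same set of lines.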

The outcome of the trace transform of a 2D image is another 2D signal
composed of a set of sinusoidal shapes that vary in amplitude, phase,
frequency, intensity and thickness. These sinusoidal signals encode
the original image with a level of distortion that depends on the
functional and the quantization parameters.

2.2.1 Functionals

A functional $S$ of a function $\xi(x)$ evaluated along the line $L$
has different properties depending on the features of the function
$\xi(x)$ (e.g., invariance to rotation, translation and scaling [41]).
Kadirov et al. [42] propose several functionals with different
invariance or sensitivity properties. These invariant functionals have
been used in expert systems for traffic sign recognition [30], face
authentication [26], [43] and fingerprinting [31].

2.2.2 Geometrical constraints 

The main parameter of the trace transform is the functional $S$, whose
properties determine the invariance of the transform under different
image transformations. However, there are geometrical parameters that
also have a strong effect on the results. These parameters are the
three measures of resolution, denoted by $\Delta\phi$, $\Delta p$ and
$\xi(\Delta L)$ for the angle, the radius and the position along the
line $L$ respectively.

The final resolution of the image obtained through the trace
transform is defined by $n_\phi$ and $n_p$, where:
X and Y are the horizontal and vertical resolutions of the image.
Equation (7) shows a symmetrical result, since the same lines are
obtained for $\phi \in [0, \pi]$ and $\phi \in [\pi, 2\pi]$. However,
this is only true for functionals that do not consider the position
(like the Radon transform). Depending on the selected functional and
on the desired properties of the trace transform (e.g., rotational
invariance), the ranges of $\phi$ and $p$ can be modified to
$\phi \in [0, \pi]$ or $p \in [0, r]$.

2.2.3 Quantization effects 

Digital images are affected by two main effects during 
trace transformation: 



min(X, Y)    (5)

with X and Y denoting the horizontal and vertical resolutions of the
image $I$.

Low $(n_\phi, n_p, n_\xi)$ values have a non-linear downsampling
effect on the original image, where $n_\xi$ is defined as:

$$
n_\xi = \frac{1}{\Delta L} \tag{6}
$$



The set of points used to evaluate each functional is described
(assuming (0, 0) as the center of the image) by:

L → y = 2p sin(φ) … tan(φ)

A singularity can be observed at $\phi =$ … and …; in these cases it
can be assumed that:

Fig. 3. Trace transform contribution mask at very high resolution
parameters (image resolution: 100x100 px, $n_\phi = 1000$,
$n_p = 1000$, $n_\xi = 5000$).

• some pixels might never be used by the functional, given the
geometrical setup of the transform and its integrating nature.

• some pixels may have a much higher cumulative effect on the
functional than others.

These effects need to be taken into account in order to preserve the
homogeneity of the results, avoiding pixels or areas with higher
relevance than others. Even for very high $(n_\phi, n_p, n_\xi)$
values in relation to the original image resolution, the trace
transform introduces a contribution intensity map that encodes the
relevance of the different regions of the input picture. As shown in
Figure 3, high resolution values of the trace transform parameters
tend to create a convex contribution intensity map. Therefore, high
parameter values do not necessarily imply an optimal representation of
the image content in the trace transform.

High values of $n_\phi$ improve the rotational invariance of the trace
transform (although in a manner that depends on the selected
functional), while very low values ($n_\phi < 5$) cannot be considered
to produce a valid trace transform, since there is not enough angular
information.

Ideally, the trace transform should satisfy the following constraints
(considering $M$ as the matrix that contains the number of repetitions
of each pixel during the trace transform):

• Coverage: all pixels of the image have to be included in at least
one functional, $\min(M) > 0$.

• Homogeneity: all pixels are used the same number of times,
$\mathrm{Var}(M) = 0$.

• High pixel repetition degree: each pixel has to be included in as
many traces as possible (high values of $\mathrm{mean}(M)$).
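
The three constraints can be checked numerically by building the repetition matrix for a given parameter set. The sketch below is an illustration under assumptions: the line geometry (nearest-neighbour sampling of lines tangent to centred circles) and the names `contribution_matrix`, `quantization_metrics`, `n_phi`, `n_p` and `n_t` are hypothetical and do not reproduce the values of Table 1.

```python
import math
from statistics import mean, pvariance


def contribution_matrix(width, height, n_phi, n_p, n_t):
    """Count how many times each pixel is sampled over all traced lines."""
    counts = [[0] * width for _ in range(height)]
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    radius = min(width, height) / 2.0
    for a in range(n_phi):
        phi = 2.0 * math.pi * a / n_phi
        cos_phi, sin_phi = math.cos(phi), math.sin(phi)
        for b in range(n_p):
            p = radius * b / max(n_p - 1, 1)
            for c in range(n_t):
                t = radius * (2.0 * c / max(n_t - 1, 1) - 1.0)
                col = int(round(p * cos_phi - t * sin_phi + cx))
                lin = int(round(p * sin_phi + t * cos_phi + cy))
                if 0 <= col < width and 0 <= lin < height:
                    counts[lin][col] += 1
    return counts


def quantization_metrics(counts):
    """Coverage ratio, min, variance and mean of the repetition matrix M."""
    flat = [v for row in counts for v in row]
    return {
        "coverage": sum(1 for v in flat if v > 0) / len(flat),
        "min": min(flat),
        "variance": pvariance(flat),
        "mean": mean(flat),
    }


coarse = quantization_metrics(contribution_matrix(16, 16, n_phi=4, n_p=4, n_t=8))
fine = quantization_metrics(contribution_matrix(16, 16, n_phi=64, n_p=32, n_t=64))
```

Comparing `coarse` and `fine` shows the trade-off discussed in the text: higher resolutions raise coverage and the repetition degree, but the variance of the matrix also has to be inspected before a setting is accepted.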

TABLE 1
Quantization effects of the trace transform

Table 1 shows some example values for coverage, homogeneity and
repetition degree at different $n_\phi$, $n_p$, $n_\xi$ resolutions.
Note that the best ratios are obtained for lower variations in $\phi$,
since the angle is the main factor that increases the variance. The
pixel repetition degree is also strongly conditioned by the angular
resolution. This makes $n_\phi$ the main factor to balance the
homogeneity and the repetition degree (e.g., low repetition degrees
show weaker rotational invariance). Once $n_\phi$ is set, $n_p$ can be
adjusted to ensure optimal coverage, while $n_\xi$ has an almost
asymptotic behavior once the other two parameters are set. Figure 5
shows some cases applied to a real image and the convex contribution
intensity mask effect for moderate or higher parameter values.



2.3 Feature extraction 

In order to reduce the set of descriptors needed to characterize the
wave-like signal obtained from the trace transform, a Discrete Fourier
Transform (DFT) or a Discrete Cosine Transform (DCT) can be applied.
The DCT [44], which has become one of the most popular transforms for
audio and image coding, has two main properties that make it more
suitable than the DFT for the feature extraction process: energy
compaction and decorrelation [45]. Energy compaction means that the
signal energy is accumulated in a small number of coefficients, which
are typically the lowest coefficients of the DCT. Taking into account
that the trace transform does not introduce high frequencies into the
transformed image, the DCT is a good method to efficiently represent
the wave-like signal information contained in the resulting images.
The decorrelation property of the DCT implies that there is very low
interdependency among the coefficients. This property matches the
needs of a number of data mining algorithms whose performance depends
strongly on input attribute correlation. Moreover, the coefficients
obtained by applying a DCT are real values, while the DFT provides
coefficients in the complex domain. The DCT thus allows information to
be encoded in lower-dimensional code spaces with better compaction
characteristics. Finally, from the computational cost point of view,
there are efficient software and hardware algorithms for the
implementation of the DCT that make it suitable for real-time
applications without high computing performance requirements.
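
The energy compaction property can be illustrated with a small pure-Python sketch. It assumes the orthonormal DCT-II, applied separably to rows and columns, and uses a synthetic smooth signal as a stand-in for a trace transform output; a production implementation would rely on a fast DCT routine instead of this direct O(n²) form.

```python
import math


def dct_1d(signal):
    """Orthonormal DCT-II of a 1-D sequence (direct O(n^2) evaluation)."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2.0 * n))
            for i, x in enumerate(signal)))
    return out


def dct_2d(matrix):
    """Separable 2-D DCT-II over the whole matrix: rows first, then columns."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Smooth, wave-like synthetic signal (assumed stand-in for a trace transform).
n = 16
signal = [[math.sin(2 * math.pi * u / n) + math.cos(2 * math.pi * v / n)
           for v in range(n)] for u in range(n)]
coeffs = dct_2d(signal)
total_energy = sum(c * c for row in coeffs for c in row)
low_energy = sum(coeffs[u][v] ** 2 for u in range(4) for v in range(4))
low_fraction = low_energy / total_energy
```

For this signal, `low_fraction` measures how much of the energy falls in the 4x4 block of lowest-frequency coefficients, which is the property that justifies discarding the highest coefficients later on.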

The 2D forward DCT is given by (12). In our case, instead of using the
typical 8x8 macroblocks (which improve the coding speed but act as
local descriptors), the transform is applied to the whole image,
keeping the global representativeness of the obtained coefficients.

2.3.1 Statistical descriptors

As a consequence of the properties of the DCT and of 
the nature of the 2D signals resulting from the trace 
transform, the 2D DCT stores more energy in its lower 
frequencies. 

Figure 5 shows the process of trace transform evaluation and its 2D
DCT, where the intensity is quantized into 5 different levels. The
functional used is the one enumerated by Srisuk et al. [26] as
functional number 3:

$$
T(f(t)) = \int (t - c)^2 f(t)\,dt
$$

$$
c = \frac{\int t\,|f(t)|\,dt}{\int |f(t)|\,dt}
$$

In order to avoid this sensitivity to specific coefficients of the
DCT, and instead of the previously discussed PCA-based approach, our
proposed DITEC method for dimensionality reduction is based on
statistical parameters of the first $n$ straight lines perpendicular
to the main diagonal (Figure 6). These coefficients, which correspond
to similar frequency bands, can be computed very efficiently. The
distribution is represented by the mean value and the kurtosis of each
vector. The pair of descriptors $(\mu, k)$ of the first element
(corresponding to the DC value of the DCT) is replaced by the mean and
variance of the original image in HSV space (Figure 1).

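Equation (17) defines the kurtosis of a distribution, which is
expressed by (18) for a discrete set of elements.

$$
k = \frac{E\left[(x - \mu)^4\right]}{\sigma^4} \tag{17}
$$

$$
k = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^4}{\left(\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2} \tag{18}
$$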
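
The reduction of a DCT matrix to a vector of mean and kurtosis pairs can be sketched as follows. Two assumptions are made here and should be read as such: the lines perpendicular to the main diagonal are taken to be the anti-diagonals of the matrix (the exact line geometry of Figure 6 may differ), and the kurtosis is the non-excess form, with constant vectors mapped to 0.0. The replacement of the DC pair by HSV statistics is left out.

```python
from statistics import mean


def kurtosis(values):
    """Non-excess sample kurtosis m4 / m2**2; returns 0.0 for constant vectors."""
    mu = mean(values)
    m2 = mean((v - mu) ** 2 for v in values)
    m4 = mean((v - mu) ** 4 for v in values)
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def antidiagonals(matrix):
    """Yield the lines u + v = const, from lowest to highest frequency."""
    rows, cols = len(matrix), len(matrix[0])
    for s in range(rows + cols - 1):
        yield [matrix[u][s - u] for u in range(rows) if 0 <= s - u < cols]


def mean_kurtosis_descriptor(dct_matrix, n_lines):
    """Return [(mean, kurtosis)] for the first n_lines anti-diagonals."""
    pairs = []
    for index, line in enumerate(antidiagonals(dct_matrix)):
        if index >= n_lines:
            break
        pairs.append((mean(line), kurtosis(line)))
    return pairs


# Hypothetical 3x3 "DCT matrix" used only to exercise the functions.
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
pairs = mean_kurtosis_descriptor(matrix, n_lines=3)
```

Each pair summarizes coefficients of roughly the same frequency band, so truncating `n_lines` discards the highest frequencies first.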
Consider that the mean and kurtosis values encode the information of
coefficients corresponding to approximately similar frequencies. The
dimensionality of the transformed $(\mu, k)$ pairs is given by (19).

nDims

where $n_c$ is the number of channels of the original image and $n_f$
the number of features extracted from each vector (2 in the case of
using $[\mu, k]$). Thus, the dimensionality reduction is given by (20).

$r_f = \dfrac{\ldots}{\ldots\, n_f}$    (20)

For square resolutions and considering $n_f = 2$, the reduction factor
increases linearly with the resolution.

Fig. 6. Conceptual scheme: DCT matrix transformation into a
$(\mu, k)$ pair vector.

2.4 Classification

After the feature extraction process explained in the previous
section, the dimensionality of the resulting descriptors can be
reduced by attribute selection strategies in order to improve the
efficiency of the subsequent classification steps.

Considering machine learning as a set of techniques to discover and
extract knowledge in an automated way [46], the basic problem is the
induction of a model that classifies a given object into one of
several known classes. In order to induce the classification model,
each element $E$, described by a pattern of $d$ features, is
simplified by applying the Feature Subset Selection (FSS) [47]
approach. FSS can be formulated as follows: given a set of candidate
features, select the "best" subset in a classification problem. In our
case, the "best" subset is the one with the best predictive accuracy.

Most supervised learning algorithms perform rather poorly when faced
with many irrelevant or redundant features (depending on the specific
characteristics of the classifier). FSS therefore provides additional
methods to reduce the number of features so as to improve the
performance of the supervised classification algorithm.

2.4.1 Feature Subset Selection in Machine Learning

There are two main approaches to tackle the Feature Subset Selection
(FSS) problem from the Machine Learning point of view [48], namely
wrapper and filter methods.

Wrapper approaches [49] try to identify the subset of variables that,
given a classification paradigm and a dataset, provides the best
classification function. The process consists of searching for an
optimal feature subspace based on a performance measure (typically the
accuracy, though other measures can be used). Each subset is evaluated
by testing the performance of the chosen paradigm on the dataset,
using only the variables in the subset. Estimating the performance of
the classifiers requires a validation scheme, such as cross-validation
or bootstrap estimation. As a result, the evaluation of each subset
involves training and testing several classification functions, which
increases the computational time required by the FSS process.
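
A wrapper search of this kind can be sketched as a greedy forward selection driven by cross-validated accuracy. The example is a simplified stand-in: a nearest-centroid classifier replaces the Bayesian networks and SVMs used in the paper, the search strategy and fold layout are assumptions, and the data set is synthetic.

```python
def nearest_centroid_accuracy(train, test, features):
    """Train a nearest-centroid classifier on `features` and score it on `test`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    hits = 0
    for x, y in test:
        predicted = min(centroids, key=lambda c: sum(
            (x[f] - centroids[c][i]) ** 2 for i, f in enumerate(features)))
        hits += predicted == y
    return hits / len(test)


def cross_validated_accuracy(samples, features, folds=5):
    """Mean accuracy over `folds` interleaved train/test splits."""
    scores = []
    for k in range(folds):
        test = samples[k::folds]
        train = [s for i, s in enumerate(samples) if i % folds != k]
        scores.append(nearest_centroid_accuracy(train, test, features))
    return sum(scores) / folds


def forward_wrapper_selection(samples, n_features, folds=5):
    """Greedily add the feature that most improves cross-validated accuracy."""
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(cross_validated_accuracy(samples, selected + [f], folds), f)
                  for f in candidates]
        score, feature = max(scored)
        if score <= best:
            break  # no remaining feature improves the estimate
        selected.append(feature)
        best = score
    return selected, best


# Synthetic data: feature 0 separates the classes, features 1 and 2 do not.
samples = [([float(label * 10 + i % 3), float((i * 7) % 5), 1.0], label)
           for i in range(20) for label in (0, 1)]
selected, accuracy = forward_wrapper_selection(samples, n_features=3)
```

Every candidate subset triggers a full round of training and testing, which makes the computational cost of wrapper methods visible even in this toy setting.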

The filter approaches search for the best variable 
subset, independently of the classification paradigm, 
considering the relationship between the predicting vari- 
ables and the class, and occasionally the relationship 
among the predicting variables. One of the simplest 
approaches consists of ranking the variables according 
to their usefulness and selecting only those on the top 
of the ranking. The usefulness of a variable is measured 
univariately by means of different metrics. 

Once the features are ranked, a threshold must be set 
to obtain the final subset. The ranking methods are only 
concerned with the relevance of the features considered 
and, thus, they do not filter out redundant variables. 
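
The ranking strategy and its blindness to redundancy can be illustrated with a short sketch. The Fisher score used here is an assumption (the text only mentions "different metrics"), and the data are synthetic: feature 1 is an exact multiple of feature 0.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class over within-class variance of a single feature."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels, top_k):
    """Score each feature univariately and keep the top_k best ranked ones."""
    scores = [fisher_score([row[f] for row in rows], labels)
              for f in range(len(rows[0]))]
    order = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return order[:top_k], scores


# Feature 1 duplicates feature 0 (scaled by 2); feature 2 is weakly informative.
rows = [[0.0, 0.0, 3.0], [1.0, 2.0, 1.0], [5.0, 10.0, 2.0], [6.0, 12.0, 2.5]]
labels = [0, 0, 1, 1]
top, scores = rank_features(rows, labels, top_k=2)
```

Both copies of the same information reach the top of the ranking with identical scores, which is the redundancy problem that a purely univariate filter cannot detect.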

The selected classifiers are briefly described below; a wrapper
Feature Subset Selection has been used in this paper.

For the supervised learning task, the label value $y$ of each sample
$x$ in the training set used to generate the classification model is
known. For this analysis, Bayesian Networks [50] and Support Vector
Machines (SVM) [51] have been used.

3 Experimental Results 

The presented method has been tested with two different datasets. The
first (Corel 1000 [52]) is a standard dataset that allows the obtained
validation data to be compared with other methods in the literature.
The second (earth observation data) is used to show the potential of
the proposed method under diverse conditions. An a priori statistical
data analysis, together with a combination of classifiers, has been
adapted for each of the two validation case studies.

3.1 Case study 1: Corel 1000 dataset

The Corel 1000 dataset is composed of 1000 images distributed in 10
classes (100 instances per class). The class tags are: Africans,
Beach, Architecture, Buses, Dinosaurs, Elephants, Flowers, Horses,
Mountains and Food. Figure 7 shows one sample per class. Even though
the classes are semantically separated, visual similarities may be
found among some of them. For example, people and trees can be found
in the Africans, Beach and Mountains categories.

The following parameters have been selected: $n_\phi = 71$,
$n_p = 71$, $n_c = 3$, $n_f = 2$. This choice results in 15,123 trace
transform coefficients per image. By obtaining the mean and kurtosis
values as described in the previous section, the number of attributes
is reduced to 606 (by a factor of 25).

Since the DCT gathers the signal energy in the lower frequencies (see
Figure 5c), the highest coefficients are removed. Moreover, it can be
assumed that the chrominance channels (Cb and Cr) contain less visual
information, and therefore more coefficients can be removed from these
channels than from the luminance signal (Y). Experimental results
obtained with different combinations of YCbCr coefficients show that
luminance-related attributes are more relevant than
chrominance-related ones. The selected parameters for this example
result in 202 attributes per channel. We select the first 104 for Y
and the first 60 for each of the Cb and Cr signals, thus reducing the
total number of attributes to 224.
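
The figures quoted above can be cross-checked with a few lines of arithmetic. The snippet only recomputes numbers stated in the text; the variable names are illustrative.

```python
n_phi, n_p, n_c, n_f = 71, 71, 3, 2

coefficients = n_phi * n_p * n_c          # trace transform coefficients per image
attributes = 606                          # attribute count reported in the text
per_channel = attributes // n_c           # attributes per channel
pairs_per_channel = per_channel // n_f    # (mean, kurtosis) pairs per channel
reduction = coefficients / attributes     # dimensionality reduction factor
kept = 104 + 60 + 60                      # Y + Cb + Cr attributes after truncation
```

The values are mutually consistent: 71 x 71 x 3 gives 15,123 coefficients, 606 attributes correspond to 202 per channel (101 pairs), the ratio is close to 25, and the per-channel truncation leaves 224 attributes.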

TABLE 2
Corel 1000 dataset confusion matrix

The best performance has been obtained by applying an SVM classifier
(precision = 84.8% in a 10-fold cross-validation test). 117 attributes
have been selected for the final feature space. The information
provided by the confusion matrix (Table 2) can be represented
graphically in order to show the qualitative behaviour of the method.
We have selected the Force Atlas 2 algorithm [53] to distribute the
classes on a 2D plane. Force Atlas 2 establishes a force-directed
layout simulating a physical system where nodes (classes) repel each
other and edges apply an attraction force. For the method presented in
this paper, the repulsion force is adjusted to scale the layout to a
convenient size, while edge forces are given by the error information
stored in the confusion matrix. Thus, the attraction force between two
nodes is proportional to their mutual misclassifications.
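
The edge weights fed to such a force-directed layout can be derived from a confusion matrix as sketched below. The symmetric sum of the two off-diagonal entries is an assumed reading of "mutual misclassifications", and the 3-class matrix is hypothetical, not data from the paper.

```python
def misclassification_edges(confusion, names):
    """Return {(class_a, class_b): weight}, weight = mutual misclassifications."""
    edges = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            weight = confusion[i][j] + confusion[j][i]
            if weight > 0:
                edges[(names[i], names[j])] = weight
    return edges


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
names = ["Beach", "Mountains", "Dinosaurs"]
confusion = [[80, 18, 2],
             [15, 85, 0],
             [0, 0, 100]]
edges = misclassification_edges(confusion, names)
```

Classes that are never confused share no edge, so only the repulsion force acts between them and they drift apart in the layout.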

For the Corel 1000 dataset, it can be observed in Figure 8 that
Dinosaurs, Flowers and Horses are clearly separated from the rest of
the categories. This result can also be verified through the precision
and recall data: precision is above 95%, and very few instances of
other classes are classified as Dinosaurs, Flowers or Horses.
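
Per-class precision and recall follow directly from the confusion matrix, as in the sketch below. The matrix is hypothetical and the row/column convention (rows are true classes, columns are predicted classes) is an assumption.

```python
def precision_recall(confusion, index):
    """Precision and recall of one class (rows: true class, columns: predicted)."""
    true_positive = confusion[index][index]
    predicted = sum(row[index] for row in confusion)
    actual = sum(confusion[index])
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    return precision, recall


# Hypothetical confusion matrix for the classes Beach, Mountains, Dinosaurs.
confusion = [[80, 18, 2],
             [15, 85, 0],
             [0, 0, 100]]
beach = precision_recall(confusion, 0)
dinosaurs = precision_recall(confusion, 2)
```

A well-separated class shows both values close to 1, which corresponds to an isolated node in the force-directed representation.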

A deeper analysis of the class distribution can be performed by
removing the three aforementioned categories. Figure 9 shows that
there is one group formed by Beach, Mountains and Architecture, and
another formed by Africans, which links to Elephants and Food although
these two are not directly connected.

Figure 10 shows an example of one of the classification errors. As can
be seen, the presence of vegetation and trees associates the image
with the Mountains class even though it belongs to Architecture. These
semantic overlaps among Corel 1000 categories put some visually
similar images in different classes.

Compared with other feature extraction approaches (Mean-Shift and
Gaussian Mixtures based on Weighted Color Histograms [12], Reduced
Feature Vector with Relevance Feedback [54] and SIFT-based Gaussian
Naive Bayesian Network [55]), DITEC shows the best performance for
most categories (Figure 11) and the highest mean precision value.

3.2 Case study 2: Geoeye satellite imagery 



The Geoeye [56] dataset is composed of 1003 multi-resolution patches
of Digital Globe Earth observation satellite imagery at ~1 m spatial
resolution. The dataset is categorized in 7 classes corresponding to
different geographical locations (Figure 12). All the resolutions have
been processed with the same trace transform parameters.

During the data mining process, Bayesian networks have shown the best
performance, reaching a precision of 94.51% in a 10-fold
cross-validation test. The final dimensionality of the feature space
has been reduced to 61 attributes. Table 3 shows the confusion matrix
of the classification results.

Fig. 8. Distance among classes in the Corel 1000 dataset according to
misclassified instances.

TABLE 3
Geoeye dataset confusion matrix

Applying the Force Atlas 2 method to the Geoeye classification errors,
we obtain the distribution shown in Figure 13. It can be observed that
Risalpur and Rome are the categories with the highest mutual
similarity (both are cities). The Davis-Monthan aircraft boneyard has
shown a remarkable similarity with Risalpur, because wide areas of
bare soil are a common element in both Risalpur and Davis.

Fig. 9. Distance among most inter-related classes in the Corel 1000
dataset according to misclassified instances.

The Midway atoll is the most distinguishable category of the Geoeye
dataset. It has distinctive color features, and its textures and
shapes are also singular within the dataset. All these characteristics
have been successfully detected by the method (Precision = 100%,
Recall = 95.4%).

Fig. 10. Corel 1000 picture corresponding to class Architecture and
classified as Mountains.

Fig. 11. Corel 1000 precision results with different feature
extraction algorithms. WHMSGM: Mean-Shift and Gaussian Mixtures based
on Weighted Color Histograms, FVR: Reduced Feature Vector with
Relevance Feedback, Gaussian NBN: SIFT-based Gaussian Naive Bayesian
Network.



4 Conclusion 

We have shown that the trace transform provides highly discriminant
features for context categorization purposes that can be encoded as
considerably short feature vectors. We have presented the geometrical
constraints of the trace transform that can be optimized to
efficiently represent the information contained in the original
images. The dimensionality reduction in terms of the mean and kurtosis
value pairs of frequency coefficients results in a very robust set of
features in terms of precision. For most resolution
$(n_\phi, n_p, n_\xi)$ settings that maintain acceptable coverage,
homogeneity and redundancy conditions, precision has remained around
82% for the Corel 1000 dataset and 92% for Geoeye.

Moreover, the method has successfully identified visual similarities
within the datasets and, as seen in the validation section, some
incorrectly classified instances are in fact visually similar to the
class assigned by the classifier. The error analysis has also shown
some semantic proximity between visually similar categories, a fact
that can be used for context modeling and automatic ontology building.